Evaluating Block Algorithm Variants in LAPACK
Abstract
The LAPACK software project currently under development is intended to provide a portable linear algebra library for high-performance computers. LAPACK will make use of the Level 1, 2, and 3 BLAS to carry out basic operations. A principal focus of this project is to implement blocked versions of a number of algorithms to take advantage of the greater parallelism and improved data locality of the Level 3 BLAS. In this paper, we describe our work with variants of some of these algorithms and the performance data we have collected.

LAPACK is planned to be a collection of Fortran 77 subroutines for the analysis and solution of systems of simultaneous linear algebraic equations, linear least-squares problems, and matrix eigenvalue problems [1]. This project will combine the functionality of LINPACK and EISPACK in a single package, incorporate recent algorithmic improvements, and restructure the algorithms to use the Level 2 and 3 BLAS (Basic Linear Algebra Subprograms) for efficiency on today's high-performance computers. We are investigating variant versions of many of the routines in LAPACK.

The building blocks of the LAPACK library are the BLAS, a set of standard subroutines for the most common operations in linear algebra [2,3,4]. The original set of BLAS, consisting of vector-vector operations, was used in LINPACK. Recently, specifications have been drawn up for matrix-vector operations (Level 2 BLAS) and matrix-matrix operations (Level 3 BLAS) to meet the demands of multiprocessing, vectorization, and hierarchical memory in today's high-performance computers. In particular, the Level 3 BLAS perform $O(n^3)$ operations on $O(n^2)$ data elements, which helps to improve the ratio of computation to memory references on machines that have a memory hierarchy.

This paper describes some of the block factorization routines in LAPACK. The blocked version calls the Level 3 BLAS and, if necessary, an unblocked version of the algorithm to do the processing within a block. The unblocked version calls only Level 1 and 2 BLAS routines and is called directly from the blocked routine if the user has set the blocksize to 1.

The LU decomposition is derived by equating the product of a unit lower triangular matrix L and an upper triangular matrix U to the original matrix A. As an illustration, A may be partitioned into blocks, the first block column being $\begin{pmatrix} A_{11} \\ A_{21} \\ A_{31} \end{pmatrix}$.
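To make the division of labor concrete, the following Fortran 77 sketch shows a right-looking blocked LU factorization with partial pivoting in the style of LAPACK's DGETRF. The routine name BLKLU and its interface are illustrative, not part of LAPACK; the panel is factored by the unblocked Level 2 routine DGETF2, and the Level 3 BLAS routines DTRSM and DGEMM carry out the triangular solve and the trailing-submatrix update. As described above, a blocksize of 1 falls through to the unblocked code.

      SUBROUTINE BLKLU( N, NB, A, LDA, IPIV, INFO )
*     Sketch (not an actual LAPACK routine) of a right-looking
*     blocked LU factorization with partial pivoting, modeled on
*     LAPACK's DGETRF.  Requires DGETF2, DLASWP, DTRSM and DGEMM.
      INTEGER            N, NB, LDA, INFO
      INTEGER            IPIV( * )
      DOUBLE PRECISION   A( LDA, * )
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
      INTEGER            I, J, JB, IINFO
      EXTERNAL           DGETF2, DLASWP, DTRSM, DGEMM
      INTRINSIC          MIN
*
      INFO = 0
*     Blocksize 1 (or less): use the unblocked algorithm directly.
      IF( NB.LE.1 ) THEN
         CALL DGETF2( N, N, A, LDA, IPIV, INFO )
         RETURN
      END IF
      DO 20 J = 1, N, NB
         JB = MIN( N-J+1, NB )
*        Factor the current panel with the unblocked Level 2 code.
         CALL DGETF2( N-J+1, JB, A( J, J ), LDA, IPIV( J ), IINFO )
         IF( INFO.EQ.0 .AND. IINFO.GT.0 ) INFO = IINFO + J - 1
         DO 10 I = J, MIN( N, J+JB-1 )
            IPIV( I ) = J - 1 + IPIV( I )
   10    CONTINUE
*        Apply the panel's row interchanges to the other columns.
         CALL DLASWP( J-1, A, LDA, J, J+JB-1, IPIV, 1 )
         IF( J+JB.LE.N ) THEN
            CALL DLASWP( N-J-JB+1, A( 1, J+JB ), LDA, J, J+JB-1,
     $                   IPIV, 1 )
*           U12 := inv(L11) * A12 (Level 3 BLAS triangular solve).
            CALL DTRSM( 'Left', 'Lower', 'No transpose', 'Unit',
     $                  JB, N-J-JB+1, ONE, A( J, J ), LDA,
     $                  A( J, J+JB ), LDA )
*           A22 := A22 - L21*U12 (Level 3 BLAS rank-JB update).
            CALL DGEMM( 'No transpose', 'No transpose', N-J-JB+1,
     $                  N-J-JB+1, JB, -ONE, A( J+JB, J ), LDA,
     $                  A( J, J+JB ), LDA, ONE, A( J+JB, J+JB ),
     $                  LDA )
         END IF
   20 CONTINUE
      RETURN
      END

The larger the blocksize, the more of the arithmetic lands in the single DGEMM call, which is where the improved ratio of computation to memory references pays off.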
Similar Resources
Reduction of a Regular Matrix Pair (A, B) to Block Hessenberg Triangular Form
An algorithm for the reduction of a regular matrix pair (A, B) to block Hessenberg-triangular form is presented. This condensed form, Q^T (A, B) Z = (H, T), where H and T are block upper Hessenberg and upper triangular, respectively, and Q and Z are orthogonal, may serve as a first step in the solution of the generalized eigenvalue problem Ax = λBx. It is shown how an elementwise algorithm can be reorganized...
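For reference, the standard LAPACK route to this condensed form is the elementwise routine DGGHRD, which expects B to be upper triangular on entry (normally arranged by a QR factorization of B). The driver below is a minimal sketch under that assumption; the order N and the matrix entries are made up, and Q and Z are not accumulated.

      PROGRAM HTDRV
*     Sketch: reduce a made-up regular pair (A,B) to Hessenberg-
*     triangular form via the standard elementwise LAPACK route.
      INTEGER            N, LDA, LDB, LWORK
      PARAMETER          ( N = 4, LDA = N, LDB = N, LWORK = 64*N )
      DOUBLE PRECISION   ZERO
      PARAMETER          ( ZERO = 0.0D+0 )
      DOUBLE PRECISION   A( LDA, N ), B( LDB, N ), TAU( N ),
     $                   WORK( LWORK )
      INTEGER            I, J, INFO
      EXTERNAL           DGEQRF, DORMQR, DLASET, DGGHRD
      INTRINSIC          DBLE
*     Arbitrary test data; B gets a boosted diagonal.
      DO 20 J = 1, N
         DO 10 I = 1, N
            A( I, J ) = DBLE( I + J )
            B( I, J ) = DBLE( I*J )
   10    CONTINUE
         B( J, J ) = B( J, J ) + DBLE( N )
   20 CONTINUE
*     Make B upper triangular: B = Q1*R, then A := Q1**T * A.
      CALL DGEQRF( N, N, B, LDB, TAU, WORK, LWORK, INFO )
      CALL DORMQR( 'Left', 'Transpose', N, N, N, B, LDB, TAU,
     $             A, LDA, WORK, LWORK, INFO )
*     Clear the Householder vectors stored below B's diagonal.
      CALL DLASET( 'Lower', N-1, N-1, ZERO, ZERO, B( 2, 1 ), LDB )
*     Reduce (A,B) to (H,T); 'N','N' means Q and Z are not formed.
      CALL DGGHRD( 'N', 'N', N, 1, N, A, LDA, B, LDB, WORK, 1,
     $             WORK, 1, INFO )
      WRITE( *, * ) 'DGGHRD INFO =', INFO
      END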
LAPACK Working Note ?: LAPACK Block Factorization Algorithms on the Intel iPSC/860
The aim of this project is to implement the basic factorization routines for solving linear systems of equations and least squares problems from LAPACK—namely, the blocked versions of LU with partial pivoting, QR, and Cholesky on a distributed-memory machine. We discuss our implementation of each of the algorithms and the results we obtained using varying orders of matrices and blocksizes.
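Of the three factorizations, Cholesky shows the blocked pattern most plainly. The sketch below (the name BLKCHL is illustrative, not an LAPACK routine) is a right-looking variant in the style of LAPACK's DPOTRF for the lower triangular case: the unblocked DPOTF2 factors the diagonal block, and the Level 3 BLAS routines DTRSM and DSYRK do the rest.

      SUBROUTINE BLKCHL( N, NB, A, LDA, INFO )
*     Sketch of a right-looking blocked Cholesky factorization
*     (lower triangular case), modeled on LAPACK's DPOTRF.
      INTEGER            N, NB, LDA, INFO
      DOUBLE PRECISION   A( LDA, * )
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
      INTEGER            J, JB
      EXTERNAL           DPOTF2, DTRSM, DSYRK
      INTRINSIC          MIN
      INFO = 0
      DO 10 J = 1, N, NB
         JB = MIN( N-J+1, NB )
*        Factor the diagonal block with the unblocked routine.
         CALL DPOTF2( 'Lower', JB, A( J, J ), LDA, INFO )
         IF( INFO.NE.0 ) THEN
            INFO = INFO + J - 1
            RETURN
         END IF
         IF( J+JB.LE.N ) THEN
*           Panel below the diagonal block:
*           L21 := A21 * inv(L11)**T  (Level 3 BLAS DTRSM).
            CALL DTRSM( 'Right', 'Lower', 'Transpose', 'Non-unit',
     $                  N-J-JB+1, JB, ONE, A( J, J ), LDA,
     $                  A( J+JB, J ), LDA )
*           Trailing update:
*           A22 := A22 - L21*L21**T  (Level 3 BLAS DSYRK).
            CALL DSYRK( 'Lower', 'No transpose', N-J-JB+1, JB,
     $                  -ONE, A( J+JB, J ), LDA, ONE,
     $                  A( J+JB, J+JB ), LDA )
         END IF
   10 CONTINUE
      RETURN
      END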
Implementation for LAPACK of a Block Algorithm for Matrix 1-Norm Estimation
We describe double precision and complex*16 Fortran 77 implementations, in LAPACK style, of a block matrix 1-norm estimator of Higham and Tisseur. This estimator differs from that underlying the existing LAPACK code, xLACON, in that it iterates with a matrix with t columns, where t ≥ 1 is a parameter, rather than with a vector, and so the basic computational kernel is level 3 BLAS operations. ...
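For contrast, the sketch below drives the existing vector-based estimator DLACON (the xLACON of this abstract) through its reverse-communication interface; the test matrix is made up. In the block estimator of Higham and Tisseur, the vector X becomes an n-by-t matrix, and the two DGEMV products turn into DGEMM calls.

      PROGRAM NRMEST
*     Sketch: estimate norm1(A) with LAPACK's vector-based
*     reverse-communication estimator DLACON.
      INTEGER            N, LDA
      PARAMETER          ( N = 5, LDA = N )
      DOUBLE PRECISION   ONE, ZERO
      PARAMETER          ( ONE = 1.0D+0, ZERO = 0.0D+0 )
      DOUBLE PRECISION   A( LDA, N ), V( N ), X( N ), WORK( N ), EST
      INTEGER            ISGN( N ), KASE, I, J
      EXTERNAL           DLACON, DGEMV, DCOPY
      INTRINSIC          DBLE
*     Arbitrary test matrix (Hilbert-like).
      DO 20 J = 1, N
         DO 10 I = 1, N
            A( I, J ) = ONE / DBLE( I + J - 1 )
   10    CONTINUE
   20 CONTINUE
      KASE = 0
   30 CONTINUE
      CALL DLACON( N, V, X, ISGN, EST, KASE )
      IF( KASE.NE.0 ) THEN
         IF( KASE.EQ.1 ) THEN
*           Estimator asks for A*X.
            CALL DGEMV( 'No transpose', N, N, ONE, A, LDA, X, 1,
     $                  ZERO, WORK, 1 )
         ELSE
*           Estimator asks for A**T * X.
            CALL DGEMV( 'Transpose', N, N, ONE, A, LDA, X, 1,
     $                  ZERO, WORK, 1 )
         END IF
         CALL DCOPY( N, WORK, 1, X, 1 )
         GO TO 30
      END IF
      WRITE( *, * ) 'Estimated 1-norm =', EST
      END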
Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited (LAPACK Working Note #208)
The objective of this paper is to extend and redesign the block matrix reduction applied for the family of two-sided factorizations, introduced by Dongarra et al. [9], to the context of multicore architectures using algorithms-by-tiles. In particular, the Block Hessenberg Reduction is very often used as a pre-processing step in solving dense linear algebra problems, such as the standard eigenva...
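The conventional LAPACK counterpart of this kind of two-sided reduction is the blocked routine DGEHRD, which brings a single matrix to upper Hessenberg form and is what the tile-based redesign above revisits. A minimal driver sketch with made-up data:

      PROGRAM HESDRV
*     Sketch: conventional blocked Hessenberg reduction with
*     LAPACK's DGEHRD; N and the matrix entries are made up.
      INTEGER            N, LDA, LWORK
      PARAMETER          ( N = 6, LDA = N, LWORK = 64*N )
      DOUBLE PRECISION   A( LDA, N ), TAU( N-1 ), WORK( LWORK )
      INTEGER            I, J, INFO
      EXTERNAL           DGEHRD
      INTRINSIC          DBLE, MOD
      DO 20 J = 1, N
         DO 10 I = 1, N
            A( I, J ) = DBLE( MOD( I*J, 7 ) )
   10    CONTINUE
   20 CONTINUE
*     Compute Q**T * A * Q = H with H upper Hessenberg.
      CALL DGEHRD( N, 1, N, A, LDA, TAU, WORK, LWORK, INFO )
      WRITE( *, * ) 'DGEHRD INFO =', INFO
      END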
Aasen's Symmetric Indefinite Linear Solvers in LAPACK
Recently, we released two LAPACK subroutines that implement Aasen’s algorithms for solving a symmetric indefinite linear system of equations. The first implementation is based on a partitioned right-looking variant of Aasen’s algorithm (the column-wise left-looking panel factorization, followed by the right-looking trailing submatrix update using the panel). The second implements the two-stage ...
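A minimal usage sketch for the routines this abstract describes, assuming the LAPACK 3.7-era interfaces DSYTRF_AA (Aasen factorization A = L*T*L**T with T tridiagonal) and DSYTRS_AA (the accompanying solve); the matrix and right-hand side are made up.

      PROGRAM AASEN
*     Sketch: solve a symmetric indefinite system with the
*     Aasen-based factorization DSYTRF_AA / DSYTRS_AA.
      INTEGER            N, LDA, LDB, LWORK
      PARAMETER          ( N = 4, LDA = N, LDB = N, LWORK = 64*N )
      DOUBLE PRECISION   A( LDA, N ), B( LDB, 1 ), WORK( LWORK )
      INTEGER            IPIV( N ), I, J, INFO
      EXTERNAL           DSYTRF_AA, DSYTRS_AA
      INTRINSIC          DBLE, MIN
*     Symmetric (indefinite) test matrix and right-hand side.
      DO 20 J = 1, N
         DO 10 I = 1, N
            A( I, J ) = DBLE( MIN( I, J ) ) - 2.0D+0
   10    CONTINUE
         B( J, 1 ) = 1.0D+0
   20 CONTINUE
*     Factor A = L*T*L**T (T tridiagonal) with Aasen's algorithm.
      CALL DSYTRF_AA( 'Lower', N, A, LDA, IPIV, WORK, LWORK, INFO )
*     Solve A*x = b using the computed factorization.
      IF( INFO.EQ.0 )
     $   CALL DSYTRS_AA( 'Lower', N, 1, A, LDA, IPIV, B, LDB,
     $                   WORK, LWORK, INFO )
      WRITE( *, * ) 'INFO =', INFO, ( B( I, 1 ), I = 1, N )
      END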
Publication date: 1989